In this workshop, the aim is to cover some basics of using variables and vectors in R, as well as a start on using strings. We will be covering:
Note to self: move packages to next workshop, just cover how to use functions here. Also move lists back and cover basic vector indexing (which can be more in depth later with data frames)
We will be working in pairs:
What to do when getting stuck:
To get feedback: hand in your R Markdown exercise file via the assignment on the Teams channel for the R 1 workshop.
A vector is a set of information contained together in a specific order.
To make a vector, you combine values using the c function (more on functions later); this is also known as concatenation. To call the c function, we use round brackets () containing the values we want, separated by commas.
The first way of making a vector is to add the arguments (numbers) you want.
## [1] 1 6 19 4 9
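The code chunk for this step is missing from this copy of the handout. A minimal sketch that would print the output above, assuming the numbers were passed straight to c (the name firstVector is a made-up placeholder):

```r
# hypothetical name; the values are taken from the printed output above
firstVector <- c(1, 6, 19, 4, 9)
firstVector
```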
We can also combine predefined variables and vectors together to create a new vector.
## [1] 1 6 19 4 9 22 7 30
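One way to get the output above is sketched below. All variable names are hypothetical, and how the original split the values between variables is an assumption; only the printed values come from the handout.

```r
# hypothetical names; only the final printed values are from the handout
firstVector <- c(1, 6, 19, 4, 9)   # an existing vector
extraNumber <- 22                  # a single predefined variable
moreNumbers <- c(7, 30)            # another existing vector

# c() joins variables and vectors end to end, in the order given
combined <- c(firstVector, extraNumber, moreNumbers)
combined
```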
Another way of making a vector is using the colon (:), which can be done without the c function. We can tell R to select a sequence of integers from x to y, or 5 through to 10 in our example.
## [1] 5 6 7 8 9 10
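A short sketch of the colon operator that gives the output above (the variable name fiveToTen is a placeholder, not from the original):

```r
# from:to builds a sequence of integers, no c() needed
fiveToTen <- 5:10
fiveToTen
```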
We can also do some basic calculations on vectors. These occur elementwise (one element at a time).
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
As you can see this divides all elements in the vector by 5.
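The calculation described above can be sketched as follows, assuming the vector being divided is the 5 to 10 sequence (the variable names are placeholders):

```r
fiveToTen <- 5:10

# the division is applied to each element in turn
divided <- fiveToTen / 5
divided
```

The same elementwise behaviour applies to +, - and *.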
A function is code organised together to perform a specific task. The function will take in an input, perform a task, then return an output. They are the backbone of R, which comes built in with a wide array of functions.
The function(input) format is the fundamental way to call and use a function in R: function is the name of the function we are using, and input is the argument or data we are passing to it.
For example:
# running times (mins)
runTimes <- c(31, 50, 15, 19, 23, 34, 9)
# mean running time
meanRun <- mean(runTimes)
meanRun
## [1] 25.85714
# tidy up result
meanRun <- round(meanRun, digits = 2)
# print nice result
paste0("Your mean running time is: ", meanRun, " minutes")
## [1] "Your mean running time is: 25.86 minutes"
Here we are using the functions c, round, mean, and paste0. We will be using these in our exercises today.
We are on a walking exercise plan, where we increase our step count by a thousand each day, starting at 1000 steps and ending on 12000.
Use the seq function to create a vector that increases steps from 1000 to 12000 by increments of 1000.
Indexing is a technical term for accessing elements of a vector. Think of it like selecting books from a bookshelf. The vector is your bookshelf, and you are the index picking which book, or books, you want to read.
Designed by macrovector / Freepik
To index in R, you type the name of the vector followed by square brackets []. Inside the square brackets you put the positions of the elements you want.
Some examples:
## [1] 9
Indexing elements 1 to 4
## [1] 4 26 11 15
Dropping elements 5 to 7
## [1] 4 26 11 15 1
Indexing 1, 5, and 8
## [1] 4 18 1
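The code for these examples is missing from this copy, so here is a sketch that reproduces the outputs above. The vector name someNumbers is hypothetical, its values are inferred from the printed results, and using position 6 for the first example is an assumption (it is one position that holds 9).

```r
# hypothetical name; values inferred from the outputs in this section
someNumbers <- c(4, 26, 11, 15, 18, 9, 3, 1)

single   <- someNumbers[6]           # one element
firstFour <- someNumbers[1:4]        # elements 1 to 4
dropped  <- someNumbers[-(5:7)]      # a minus sign drops elements 5 to 7
picked   <- someNumbers[c(1, 5, 8)]  # elements 1, 5, and 8

single
firstFour
dropped
picked
```

Note the brackets in -(5:7): without them, -5:7 would be read as the sequence from -5 to 7.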
If you try to index outside of the vector's range, you get an NA. You can check the size of a vector using the length function. Our vector has 8 elements, but we tried to call a 9th.
## [1] NA
## [1] 8
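A sketch of the two calls behind the outputs above, again using the hypothetical someNumbers vector with values inferred from this section:

```r
# hypothetical name; values inferred from the outputs in this section
someNumbers <- c(4, 26, 11, 15, 18, 9, 3, 1)

ninth <- someNumbers[9]        # there is no 9th element, so R returns NA
howMany <- length(someNumbers) # number of elements in the vector

ninth
howMany
```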
Using indexing you can change the value of an item, or items, in a vector.
## [1] 4 26 11 15 18 9 3 50
## [1] 19 20 21 15 18 9 3 50
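The two outputs above are consistent with the following sketch, which first replaces one element and then three at once. The name someNumbers and the starting values are inferred, not taken from the original code.

```r
# hypothetical name; starting values inferred from the outputs in this section
someNumbers <- c(4, 26, 11, 15, 18, 9, 3, 1)

# replace a single element: index on the left, new value on the right
someNumbers[8] <- 50
afterOneChange <- someNumbers
afterOneChange

# replace several elements at once with a vector of the same length
someNumbers[1:3] <- c(19, 20, 21)
someNumbers
```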
You decided to track your total monthly expenditures for the year to find out more about your monthly spending, such as spending per quarter, the biggest spending month, and the lowest spending month.
Using the which.max() and which.min() functions, find out which months had the highest and lowest spending.
So far we have only been working with numbers and integers. Strings are text-based data, which R calls characters.
To code a string you need to use quotation marks. You can use either single or double, depending on your preference. When printing the result, R will always use double quotation marks.
## [1] "Oak" "Willow" "Redwood"
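A sketch that would print the output above. The variable name trees is an assumption, and the mix of quote styles is only there to show that both work.

```r
# hypothetical name; single and double quotes both create strings
trees <- c("Oak", 'Willow', "Redwood")

# R prints every string with double quotes, whichever style was typed
trees
```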
You have to be careful not to run functions on strings that need numerical data.
Using the notNumbers variable defined here:
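The chunk that defined notNumbers is missing from this copy of the handout. As a stand-in, assuming it was a character vector (the values below are invented for illustration), calling mean on it shows the problem:

```r
# hypothetical stand-in: the handout's own definition of notNumbers is missing
notNumbers <- c("12", "7", "30")

# mean() needs numeric data; on a character vector it gives a warning
# and returns NA (the warning is silenced here to keep the output short)
meanOfText <- suppressWarnings(mean(notNumbers))
meanOfText
```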
You can find out what type of data your variable or vector holds using the class function.
## [1] "character"
## [1] "numeric"
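A sketch of class on one character vector and one numeric vector. Which variables the original passed to class is not recoverable, so trees is an assumed name, while runTimes is reused from the functions section.

```r
trees <- c("Oak", "Willow", "Redwood")       # hypothetical name
runTimes <- c(31, 50, 15, 19, 23, 34, 9)     # from the functions section

treesClass <- class(trees)     # text data is "character"
runClass <- class(runTimes)    # numbers made with c() are "numeric"

treesClass
runClass
```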
R comes with several useful functions for manipulating strings, including paste, paste0, grep, and gsub. paste and paste0 are for creating strings, and grep and gsub are for string matching and replacement.
Some examples of paste:
## [1] "Oak, Willow, Redwood"
## [1] "Hi, there"
## [1] "This week I ate 4 pizzas..."
As you can see paste can make new strings from existing strings.
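The three outputs above can be reproduced with the sketch below. The variable names (trees, pizzas) and the exact arguments are assumptions; the sep and collapse arguments are standard parts of paste.

```r
trees <- c("Oak", "Willow", "Redwood")   # hypothetical name

# collapse joins the elements of one vector into a single string
treeList <- paste(trees, collapse = ", ")

# sep sets what goes between separate arguments (the default is a space)
greeting <- paste("Hi", "there", sep = ", ")

# numbers are converted to text automatically
pizzas <- 4                              # hypothetical name
sentence <- paste("This week I ate", pizzas, "pizzas...")

treeList
greeting
sentence
```

paste0 works the same way but puts nothing between the arguments, which is why the spaces were typed inside the strings in the running-time example.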
Use paste to make this string: “sunflower, poppy, dahlia”
Use paste0 to make the following sentence using the daysRaining variable: “It has been raining for 360 days this year”
We use grep for string matching. We give it the string, or part of a string, we are looking for, and it returns the positions in the vector where that string is found.
## [1] "Hampshire" "Hampshire" "London" "London" "London" "London"
## [7] "London" "Kent" "Surrey" "Surrey" "Surrey"
## [1] 3 4 5 6 7
It is very useful for indexing strings.
## [1] "London" "London" "London" "London" "London"
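A sketch of the grep calls that match the outputs above. The places vector is the one defined at the end of this handout; londonPositions is a placeholder name.

```r
# vector of places people are from (as defined later in the handout)
places <- c(rep("Hampshire", 2), rep("London", 5), rep("Kent", 1), rep("Surrey", 3))
places

# grep returns the positions of the elements that match the pattern
londonPositions <- grep("London", places)
londonPositions

# those positions can then be used as an index
londoners <- places[londonPositions]
londoners
```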
With gsub we give the function the pattern we are looking to replace, what to replace it with, and the variable or vector to work on.
## [1] "drew" "drea" "gela"
Here we are removing the An from the names in the Names vector.
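A sketch of the call described above. The contents of Names are inferred from the output and the removed "An", so treat them as an assumption.

```r
# values inferred from the output: each name started with "An"
Names <- c("Andrew", "Andrea", "Angela")

# gsub(pattern, replacement, x): replace every "An" with an empty string
shortNames <- gsub("An", "", Names)
shortNames
```

The match is case sensitive, so a lower-case "an" elsewhere in a name would be left alone.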
For this exercise I have given you the code, but it is in the wrong order. You need to rearrange the code so that it runs correctly. Comment on what each line of code is doing.
The end result you are aiming for is: “These 4 pokemon have ‘ar’ in their names: Charmander, Charmeleon, Charizard, Wartortle”
pokemon <- gsub("[0-9]", "", pokemon)
paste0("These ",arPokes_num, " pokemon have 'ar' in their names: ", arPokes)
arPokes_num <- sum(length(arPokes))
arPokes <- pokemon[grepl("ar", pokemon)]
pokemon <- c("Bulbasaur001", "Ivysaur002", "Venusaur003",
"Charmander004", "Charmeleon005", "Charizard006",
"Squirtle007", "Wartortle008", "Blastoise009")
arPokes <- paste(arPokes, collapse = ", ")
If you attended the first R workshop, you might remember we calculated a student's weighted average grade. Convert this to incorporate 10 students instead of just the one.
exam1 <- c(52, 62, 55, 82, 48, 65, 68, 62, 65, 65)
coursework1 <- c(72, 72, 85, 52, 78, 62, 65, 52, 55, 68)
exam2 <- c(62, 72, 58, 52, 68, 75, 62, 65, 62, 88)
coursework2 <- c(72, 62, 65, 62, 78, 45, 78, 65, 55, 75)
cw_weight <- 0.4
ex_weight <- 0.6
course1 <- (exam1 * ex_weight) + (coursework1 * cw_weight)
course2 <- (exam2 * ex_weight) + (coursework2 * cw_weight)
overall_grade <- (course1 + course2)/2
overall_grade
## [1] 63.0 67.0 63.9 63.0 66.0 63.4 67.6 61.5 60.1 74.5
# vector of places people are from
places <- c(rep("Hampshire", 2), rep("London", 5), rep("Kent", 1), rep("Surrey", 3))
# counting how many people from each place
table(places)
## places
## Hampshire      Kent    London    Surrey
##         2         1         5         3